All code and higher resolution images for this project can be found on GitHub at https://github.com/erickabsmith/masters-project-lcz-classification.
3/9/2021
Local Climate Zone classes. Originally from Stewart and Oke (2012) and remade by Bechtel et al. (2017). Copyright CC-BY 4.0
Yoo
The LCZ reference data
The Landsat 8 data
All 9 available bands of all 4 Landsat scenes amounted to 36 input variables. Each pixel is an observation,
Splits are typically evaluated by Gini impurity or entropy:
\[ \text{Gini Impurity} = I_G(t) = 1 - \sum_{i=1}^{C} p(i|t)^2 \]
\[ \text{Entropy} = I_H(t) = -\sum_{i=1}^{C} p(i|t)\log_2 p(i|t) \]
Here \(i\) is a class of the response variable, ranging from 1 to \(C\); \(C\) is the total number of classes represented in a particular node \(t\); and \(p(i|t)\) is the proportion of samples in node \(t\) that belong to class \(i\). Both measures equal 0 for a pure node, where every sample belongs to the same class.
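As a minimal sketch of the two split criteria above, the following Python computes both impurity measures for a single node from its class labels. The function names and the example labels are hypothetical and chosen only for illustration; they are not taken from the project code.

```python
from collections import Counter
from math import log2


def class_proportions(labels):
    """Return p(i|t): the share of samples in the node belonging to each class present."""
    counts = Counter(labels)
    n = len(labels)
    return [count / n for count in counts.values()]


def gini_impurity(labels):
    """I_G(t) = 1 - sum_i p(i|t)^2."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """I_H(t) = -sum_i p(i|t) * log2 p(i|t); only classes present in the node are summed."""
    return sum(-p * log2(p) for p in class_proportions(labels))


# Hypothetical node holding two samples each of LCZ classes 1 and 2.
mixed_node = [1, 1, 2, 2]
# Hypothetical pure node: every sample belongs to class 3.
pure_node = [3, 3, 3]

print(gini_impurity(mixed_node), entropy(mixed_node))
print(gini_impurity(pure_node), entropy(pure_node))
```

A two-class node split evenly gives the maximum values for two classes (0.5 for Gini impurity, 1 bit for entropy), while the pure node gives 0 for both.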
In line with the methods used in our reference paper and the remote sensing field, accuracy metrics will include the following:
\[ \text{Overall Accuracy}= OA= \frac{\text{number of correctly classified reference sites}}{\text{total number of reference sites}} \]
\(OA_{urb}\) and \(OA_{nat}\) will also be used. They are calculated in the same way as \(OA\) but include only the urban and natural classes, respectively.
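The overall accuracy and its urban and natural variants can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the project's implementation: the function name and the example labels are hypothetical, and restricting the calculation by the reference (true) class of each site is an assumption about how the subsets are formed. The class numbering (urban 1-10, natural 11-17) follows the LCZ scheme used in this report.

```python
URBAN_CLASSES = set(range(1, 11))     # LCZ classes 1-10
NATURAL_CLASSES = set(range(11, 18))  # LCZ classes 11-17


def overall_accuracy(reference, predicted, classes=None):
    """Share of reference sites classified correctly.

    If `classes` is given, only sites whose reference label is in that set
    are counted (assumed definition of the urban and natural subsets).
    """
    pairs = [
        (ref, pred)
        for ref, pred in zip(reference, predicted)
        if classes is None or ref in classes
    ]
    if not pairs:
        raise ValueError("no reference sites in the requested classes")
    return sum(ref == pred for ref, pred in pairs) / len(pairs)


# Hypothetical labels for six reference sites.
reference = [1, 2, 6, 8, 11, 14]
predicted = [1, 3, 6, 2, 11, 14]

oa = overall_accuracy(reference, predicted)
oa_urb = overall_accuracy(reference, predicted, URBAN_CLASSES)
oa_nat = overall_accuracy(reference, predicted, NATURAL_CLASSES)
print(oa, oa_urb, oa_nat)
```

In this toy example two of the four urban sites are misclassified while both natural sites are correct, so \(OA_{urb}\) falls below \(OA\) and \(OA_{nat}\) rises above it.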
\[ UA(z) = \frac{\text{number of correctly identified pixels in class z}}{\text{total number of pixels identified as class z}} \]
\[ PA(z) = \frac{\text{number of correctly identified pixels in class z}}{\text{number of pixels truly in class z}} \]
\(UA\) is the user’s accuracy, also called precision or positive predictive value. \(PA\) is the producer’s accuracy, also known as recall or sensitivity. The harmonic mean of \(UA\) and \(PA\) gives the \(F_1\) score, which combines the two into a single measure of accuracy for each class. An \(F_1\) score closer to 1 indicates that the model produces both few false positives and few false negatives for that class.
\[ F_1\text{ Score} = 2 \cdot \frac{UA \cdot PA}{UA + PA} \]
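The per-class metrics above can be illustrated with a short Python sketch. The function name and the example labels are hypothetical, and returning 0 when a denominator is zero is an assumed convention, not something specified in this report.

```python
def class_metrics(reference, predicted, z):
    """Return (UA, PA, F1) for class z from paired reference and predicted labels."""
    correct = sum(ref == z and pred == z for ref, pred in zip(reference, predicted))
    n_predicted = sum(pred == z for pred in predicted)  # pixels identified as class z
    n_reference = sum(ref == z for ref in reference)    # pixels truly in class z

    # Assumed convention: a metric with a zero denominator is reported as 0.
    ua = correct / n_predicted if n_predicted else 0.0
    pa = correct / n_reference if n_reference else 0.0
    f1 = 2 * ua * pa / (ua + pa) if (ua + pa) else 0.0
    return ua, pa, f1


# Hypothetical labels: four pixels truly in class 2 and two truly in class 5.
reference = [2, 2, 2, 2, 5, 5]
predicted = [2, 2, 5, 5, 5, 5]

ua_2, pa_2, f1_2 = class_metrics(reference, predicted, 2)
ua_5, pa_5, f1_5 = class_metrics(reference, predicted, 5)
print(ua_2, pa_2, f1_2)
print(ua_5, pa_5, f1_5)
```

Here every pixel labelled class 2 really is class 2 (high \(UA\)), but half of the true class 2 pixels are missed (low \(PA\)); class 5 shows the opposite pattern. The \(F_1\) score penalizes both cases.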
The parameter for the number of trees was initially varied between 5 and 500 at intervals of 5. The resulting overall accuracy metrics indicate a leveling off around 125 trees (Figure 2). There’s also a clear distinction between accuracy in urban vs. natural classes, with natural classes having a much higher overall accuracy.
The increase in OA metrics levels off around 125 trees. Urban classes (1-10) have much lower accuracy than natural classes (11-17). These metrics were calculated based on the out-of-bag dataset.
F-1 scores vary considerably between LCZ classes. As the number of trees in the random forest increases, the F-1 score also increases, until around 100 trees. These metrics were calculated based on the out-of-bag dataset.
OA and F-1 metrics dropped dramatically upon applying the random forest to the test data (Figure 4).
Accuracy of the random forest predictions for the test dataset varied widely. F-1 scores were lower than expected and do not seem to agree with the OA metrics. Classes 2, 5, 8, and 14 have particularly low F-1 scores.
There is no clear pattern in the mean decrease in Gini impurity across the different bands and scenes, though there is some indication that the bands in scene 4 were particularly effective predictors.
Imagery of the area of interest. Each has a basemap of satellite reference imagery. Top Left: Only satellite reference. Top Right: One Landsat 8 Scene. Bottom: A fully predicted LCZ map.